Stability Threshold as a Selection Principle for Protein Design 
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The sensitivity of the native states of protein-like heteropolymers to mutations modelled as 
perturbations in the interaction potential between amino acids is studied. The stability threshold 
against mutations is shown to be zero for random heteropolymers on a lattice in two dimensions, 
whereas a design procedure modelling evolution produces a non-zero threshold. We introduce an 
evolution-like protein design procedure based on an optimization of the stability threshold that is 
shown to naturally ensure thermodynamic stability as well. 

PACS numbers: 87.15. By, 87.10. +c 



Natural proteins are made up of sequences of 
amino acids that fold rapidly into specific compact 
structures corresponding to minimum free energy 
states called native states ]l|-[To|]. The structure of 
the native state conformation controls the function- 
ality of the protein. Because the number of possi- 
ble random amino acid sequences and the number 
of possible conformations are gigantic, an important 
issue is an understanding of the selection principles 
that apply to protein sequences and/or native state 
structures. 

The two key ingredients of evolution are diversity 
(afforded by the availability of 20 amino acids) and 
stability (a functionally useful sequence should not 
be mutated away). The stability of the occupancy of 
the native state on increasing the temperature (i.e. 
the thermodynamic stability) has been argued to be 
a characteristic of a good folder P,p| Jll| , p^ | . Here, 
we consider a different kind of stability against mu- 
tations of the sequence or equivalently perturbations 
in the effective interaction potential between amino 
acids. We demonstrate that the two types of stabil- 
ities are related in the sense that each one implies 
the other. We model the mechanism of evolution 
through natural selection in proteins and discuss its 
implication in the protein design problem. We show 
that the native states of random heteropolymers are 
not stable against mutations, whereas sequences de- 
signed to be thermodynamically stable are character- 
ized by a non-zero stability threshold. Conversely, 
an evolution- like design scheme that attempts to 
maximize the stability threshold is shown to lead to 
greater thermodynamic stability as well. Our work 
provides a characterization of the "twilight zone" 
and the observation that proteins form families ac- 



cordin g to the spatial conformation of their native 
states [Q. As suggested by Li et al. |jl5), structure 
selection is an appealing complementary view to se- 
quence selection. We show, however, that selection 
processes involving sequences could be as important 
as structure selection. The effects of destabilizing 
factors such as variations in the denaturant concen- 
tration and site directed mutagenesis on the kinetics 
of folding have been recently investigated in Refs. 
p6[ . A careful analysis of such perturbations has 
proven helpful in the clarification of the folding fun- 
nel in terms of collective coordinates [pPJlq] . 

We consider self-avoiding chains of N monomers 
on a 2D square lattice. The Hamiltonian is 



ff.(0 = E¥('i- r j) 1 



(1) 



where Hy is the coupling between monomer i and j 
and 5 is nonzero (and equal to 1) only if and Tj are 
are adjacent sites on the lattice and i and j are not 
next to each other in sequence. This hamiltonian is 
well known in protein modelling |^],[|. For a given 
sequence (and Bij) with N < 25, we enumerate 
the energies of all possible conformations and are 
able to determine the native state (ground state) 
conformation exactly. 

We consider two versions of the model. In the first 
(which we will call the model), the N monomers 
are assumed to be distinct. The Bi^ matrix is sym- 
metric and has N(N + l)/2 elements. In order to 
obtain a random heteropolymer, these elements are 
drawn from a Gaussian distribution with mean value 
-2 and variance 1. Effectively, the matrix B repre- 
sents a certain sequence. The model is identical to 
that studied by Dinner et al. P,0,fL9[ . The random 
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contact energies are in approximate correspondence 
to a more realistic parametrization of the contact 
energies by Miyazawa and Jernigan |^0|,|lJ or by 
Kolinski, Godzik and Skolnik p2[ . In order to model 
evolved sequences with a large stability gap 0], we 
follow the rank-ordered procedure outlined by Shri- 
vastava et al. [^3| of shuffling the By entries to as- 
sign the most favorable attractive interactions to the 
native contacts of the maximally compact native fold 
chosen as fixed target. 

In the second model (denoted as the MJ model), 
each monomer is chosen to represent one of the 
twenty amino acids with the interactions determined 
by Miyazawa and Jernigan 20 2^). A random se- 
quence would correspond to a random choice of the 
amino acids. In order to mimic the relevant feature 
of sequences selected by evolution it is no longer pos- 
sible to follow the rank ordering procedure because 
the Bi j entries cannot be shuffled at will. Instead, 
after having fixed a target fold Jl^|, we have used 
a recently proposed protein design procedure Q 
entailing an optimization scheme in sequence space 
which allows one to obtain sequences with a desired 
native state conformation and a required measure of 
thermodynamic stability, enforced by fixing a "de- 
sign temperature" Td and selecting those sequences 
with Tf > Td- 

Our calculations begin with the selection of two 
distinct interaction matrices which we shall call B 
and C. We shall consider 4 choices for B: the ran- 
dom and the evolved By and MJ models. C is 
chosen randomly. The ground states of the B and 
C sequences are generally distinct. We now con- 
sider mutations of the sequence along a trajectory 
parametrized by a mixing coefficient a 6 [0, 1] that 
changes the interaction matrix from B to C [fl7[: 



B a = (1 - a)B + aC. 



of sequences X and Y respectively. Note that A has 
been normalized so that it is 1 when a = 1, as long 
as the ground states of B and C are distinct. 

Our primary probe of the stability to mutations is 
via a study of the dependence of A on a. Qualita- 
tively similar trends are found for both the models 
- the signature of the selection in sequence space is 
in the quite distinct behavior of random and evolved 
sequences. A summary of our results for the behav- 
ior of the average A as a function of a for N = 16 
is shown in Fig. 1. The curves have been obtained 
as an average over 1000 realizations of independently 
chosen B and for each of them over 1000 realizations 
of C for the By model and over 10 realizations of 
B and for each of them over 1000 realizations of C 
for the MJ model. The average stability threshold is 
zero for random heteropolymers [p4[ and is distinctly 
non-zero for the evolved cases. 

Furthermore, the stability threshold goes up with 
the overall thermodynamic stability - in Figure 1 , for 
the MJ model, the region of stability against muta- 
tions increases with the design temperature Td. The 
threshold is somewhat reduced but is clearly non- 
zero when one considers rank ordered Bi j sequences 
that have native states in conformations that are not 
maximally compact. One increasing N, the number 
of monomers, comprising the evolved sequences, the 
stable phase along the a-axis increases in size along 
with a sharpening of the A — a curve suggestive of a 
sharp phase transition at the onset of instability in 
the thermodynamic limit. 

One may also define an individual stability thresh- 
old a t (B,C) in the strength a of the perturbation 
above which A becomes non-zero for the first time - 
the native structure of sequence B is destabilized. 
Normalized probability distributions P{at) of the 
individual stability thresholds for the random and 
(2) evolved By models are shown in Fig. 2. They 



The coefficient a is a measure of the distance in se- 
quence space between B and B a . This is a quite 
general perturbation of which the natural occur- 
ring ones are a subset. The structural similarity 
of the ground state conformations of these two se- 
quences is given by the normalized distance A(a) = 
d(B a , B)/d(C, B), where the distance d(X,Y) is de- 
fined by 



d(X,Y) 



N 



r >. .)2 



(3) 



where ry and r[ • are the Euclidean distances be- 
tween amino acids i and j in the the two native states 



underscore the different behaviors in the two cases. 
Our results, in the random case, are marginally re- 
lated to a study of Bryngelson [£5[ and in the evolved 
case, to a recent study of Pande et al. pj, where the 
authors addressed the issue of stability of the ground 
state against inaccuracies in the potential. Bryngel- 
son p5[ used a mean field theory to estimate the 
probability of predicting the correct structure of a 
sequence of monomers if the interatomic potential 
is known only to an accuracy of r/. A non-zero r\ 
could arise from variations in the solvent properties 
or due to the imperfect parametrization or determi- 
nation of the potential between amino acids or, as 
in our case, from mutations in the sequences. Pande 
et al. H] showed analytically that the ground state 
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of designed sequences is significantly robust against 
the introduction of random noise in the interaction 
matrix. 

Imperfectly folded proteins within the cell are de- 
molished by proteolytic enzymes [ p6| . Unfolded pro- 
teins can be present either as a product of a desta- 
bilizing mutation or due to a variation of the solvent 
properties. Thus the folded structure must be robust 
against perturbations in order to survive in the cell. 
The role of sequence selection is to produce robust 
sequences which rapidly fold into a target confor- 
mation with a specific function. Can one develop a 
design scheme starting from existing functional se- 
quences and produce artificial homologues with bet- 
ter functionality? 

The design scheme works as follows. 1) Select an 
initial random sequence. 2) Compute its MJ matrix 
B and its stability threshold by extracting a set of 
100 realizations of the perturbation C. 3) Subject 
the sequence to Monte Carlo optimization proce- 
dure: monomers are swapped and the new sequence 
is accepted if its stability threshold is increased. 4) 
After 1000 such Monte Carlo steps, stop the opti- 
mization and compute the folding transition tem- 
perature Tf of the resulting sequence. As shown in 
Fig. 3, Tf averaged over sequences correlates well 
with the threshold. 

We turn now to a recent study of Li et al. ||l5|| who 
suggested that certain highly designable structures 
that are the unique native states of a large number 
of sequences are special in that they are thermo- 
dynamically more stable than other structures and 
are stable against mutations in the sequence. Their 
study was of chains comprising 27 monomers made 
up of two kinds of amino acids, hydrophobic (H) and 
polar (P), on a simple cubic lattice. The bare values 
of their interaction parameters were expressed (af- 
ter scaling and shifting) in convenient units so that 
E(P,P)=0, E(H,P)=-1 and E(H,H)=-2.3. Our own 
studies of the identical model (without an overall 
attractive shift in the bare interaction parameters 
that would promote maximally compact structures) 
necessitate the consideration of non-maximally com- 
pact conformations as well. We have studied a two 
dimensional version of the model with the bare un- 
shifted interaction parameters considered by Li et 
al. and with 16 monomers. As suggested by them, 
we find that the conformations come with varying 
degrees of designability. There are many conforma- 
tions that only a few sequences have as a unique 
native state while there are few that are the native 
states of a large number of sequences. Fig. 4 shows 
a plot of the thermodynamic stability as measured 



by a Z score [p7j (see figure caption for a defini- 
tion) versus the number of sequences that design the 
structure. The three curves correspond to the high- 
est, the mean and the lowest Z score. There is no 
evidence of a jump in the thermodynamic stability 
beyond a certain value of N s as found by Li et al. 
for the case in which all native states are necessarily 
maximally compact. Indeed, for a given structure, 
there are variations in the stability on tuning the se- 
quences (as in the MJ curves in Fig. 1). Strikingly, 
highly designable conformations in 2D are always 
maximally compact. In 3D, with the same choice 
of unshifted interaction parameters, while globular 
structures with no holes, resembling real protein 
structures, are generally the ground state of HP se- 
quences, these structures are not necessarily maxi- 
mally compact. These findings show that, at least 
for interactions that do not always lead to maxi- 
mally compact native states, the selection process 
primarily involves the sequences and not the struc- 
ture. The present study has concentrated on the 
thermodynamic effects of mutations. The two-way 
link between resistance against mutations and ther- 
modynamic stability demonstrated here could also 
have ramifications on the dynamics of folding, for 
example, in the key role that specific amino acids 
have in nucleation mechanisms of folding Jl6| , p8| . 
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FIG. 1. Average structural similarity A as a function 
of the perturbation a. The non evolved and evolved cases 
for both the models used are shown. For the Bi,j model 
the averages have been taken over 1000 realizations of 
B and for each of them over 1000 realizations of C. For 
the MJ model we averaged over 10 realizations of B and 
for each of them over 1000 realizations of C. In our 
design procedure for the MJ model, only B sequences 
with a folding transition temperature Tf greater than 
the indicated value of the design temperature Td (0.0,1.0, 
and 1.2, respectively) were considered. Tf > indicates 
no selection. 

FIG. 2. The normalized probability distributions 
P(a t ) of the individual stability thresholds at for the 
non evolved and evolved Bij case. The statistics are the 
same as in Fig. 1. 

FIG. 3. Folding transition temperature Tf as a func- 
tion of the stability threshold a in the MJ model. The 
sequences have been designed through threshold opti- 
mization. 

FIG. 4. The Z score plotted against N s for the 2D 
version of the 3D model considered by Li et al. Jl5| . The 
Z score is the difference between the average energy (E) 
of the compact conformations and the ground state en- 
ergy, E, scaled by the dispersion a, Z = ((E)) — E/a. 
(E) and a were calculated as averages over all conforma- 
tions with 7 or more contacts, which are those competing 
to be the native state. Maximally compact conforma- 
tions have nine contacts. The Z score is a measure of 
the thermodynamic stability of the native state. Sim- 
ilar trends are observed on considering the energy gap 
instead of the Z score. 
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* Bij evolved 

* MJ Tf > 0.0 
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